
    Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions

    This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report the organizational principles behind the shared task and the evaluation metrics used for ranking. The 17 participating systems, their methods and the results they obtained are also presented and analysed.
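    As a rough illustration of how identification systems can be ranked, the sketch below computes MWE-based precision, recall and F1 under exact matching, representing each verbal MWE as the set of token positions it covers. This is a simplification made for illustration only: it is not the shared task's official scorer (which also reports token-based scores), and the name `mwe_based_prf` and the set-of-indices representation are assumptions.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical representation: a verbal MWE is the frozenset of token indices it covers.
Mwe = FrozenSet[int]


def mwe_based_prf(gold: Set[Mwe], pred: Set[Mwe]) -> Tuple[float, float, float]:
    """Exact-match, MWE-based precision, recall and F1 (a simplified sketch)."""
    # A prediction counts as correct only if its token set equals a gold MWE's token set.
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Gold: "take ... decision" at tokens 1 and 3; "give up" at tokens 6 and 7.
    gold = {frozenset({1, 3}), frozenset({6, 7})}
    # The system finds the first VMWE but only part of the second one.
    pred = {frozenset({1, 3}), frozenset({6})}
    print(mwe_based_prf(gold, pred))  # prints (0.5, 0.5, 0.5)
```

Under exact matching, the partially identified "give up" earns no credit at all. Partial overlaps of this kind are what token-based scores are meant to reward.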

    Selecting a part-of-speech tagger prioritizing precision in lexical categories

    In this article, four Part-of-Speech (PoS) taggers for Spanish are compared. The evaluation was carried out without prior training or tuning of the taggers. To allow a comparison across taggers, their tagsets were mapped to the universal PoS tagset (Petrov, Das, and McDonald, 2012). The taggers are also compared with respect to the information they provide and how they handle features specific to Spanish, such as verbal clitics and contractions (portmanteaux).
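    Comparing taggers through a shared tagset can be sketched as a lookup from each tagger's native tags to universal tags, followed by a per-category precision count. This is a minimal sketch under stated assumptions: it assumes EAGLES-style native tags whose first letter encodes the category, and the mapping, `TAG_MAP`, `to_universal` and `precision_per_category` are hypothetical illustrations, not the mapping used in the article.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, simplified mapping keyed on the first letter of an EAGLES-style tag,
# targeting the universal PoS tagset of Petrov, Das and McDonald (2012).
TAG_MAP: Dict[str, str] = {
    "N": "NOUN", "V": "VERB", "A": "ADJ", "R": "ADV", "P": "PRON",
    "D": "DET", "S": "ADP", "Z": "NUM", "C": "CONJ", "F": ".",
}


def to_universal(tag: str) -> str:
    """Map a native tag to a universal tag; unknown categories become 'X'."""
    return TAG_MAP.get(tag[:1], "X")


def precision_per_category(gold: List[str], predicted_native: List[str]) -> Dict[str, float]:
    """For each predicted universal category: correct predictions / all predictions of it."""
    predicted = [to_universal(t) for t in predicted_native]
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {cat: correct_counts[cat] / n for cat, n in predicted_counts.items()}


if __name__ == "__main__":
    # "El gato come pescado fresco" ("The cat eats fresh fish") with gold universal tags.
    gold = ["DET", "NOUN", "VERB", "NOUN", "ADJ"]
    # Native tagger output; the adjective "fresco" is wrongly tagged as a common noun.
    native = ["DA0MS0", "NCMS000", "VMIP3S0", "NCMS000", "NCMS000"]
    print(precision_per_category(gold, native))
```

Here the misclassified adjective lowers NOUN precision to 2/3, while DET and VERB remain at 1.0. A per-category view like this is what lets one favour precision on lexical categories when choosing a tagger.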

    Combining translation memories and syntax-based SMT: experiments with real industrial data

    One major drawback of using Translation Memories (TMs) in phrase-based Machine Translation (MT) is that only continuous phrases are considered. In contrast, syntax-based MT allows phrasal discontinuity by learning translation rules that contain non-terminals. In this paper, we combine a TM with syntax-based MT via sparse features. These features are extracted during decoding based on translation rules and their corresponding patterns in the TM. We tested this approach in experiments on real English–Spanish industrial data. Our results show that these TM features significantly improve syntax-based MT. Our final system yields improvements of up to +3.1 BLEU, +1.6 METEOR, and -2.6 TER compared with a state-of-the-art phrase-based MT system.

    The Harvesting Day: an initiative to enhance the visibility of language resources

    The Harvesting Day is an initiative to ensure the visibility, accessibility and description of language resources by means of a basic metadata schema. The initiative advocates a change of strategy in which resource and technology providers become responsible for the visibility of their own resources and of their documentation. Once descriptions of the language resources have been properly created and stored, the corresponding metadata are automatically and periodically harvested and sent to the main virtual repositories and catalogues. This ensures not only the visibility of language resources and technologies but also the accuracy of their data, which are thereby kept up to date.
    Funding: Ministerio de Ciencia e Innovación; Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya.

    Machine translation as an academic writing aid for medical practitioners

    In this paper, we explore the utility of Machine Translation (MT) as a writing aid and its impact on the quality of the resulting text. We focus on medical practitioners who are native speakers of Spanish and who need to publish their scientific work in English as a foreign language. After carrying out a general survey to determine whether Spanish-speaking medical practitioners already use MT as a writing aid, we engaged five participants in an experiment in which they wrote a paper in Spanish that was subsequently machine translated. They were then asked to post-edit the MT output. We analyse their post-edits and further evaluate the overall quality of their texts with the help of a professional proofreader. Our results suggest that texts produced with MT plus post-editing still require many edits before they can be considered of acceptable quality. In the conclusion, we identify several avenues that merit further investigation and could help achieve better quality.

    Ethics Recommendations for Crisis Translation Settings

    This document is a public summary of the Ethics Recommendations for Crisis Translation Settings produced by members of the INTERACT project team. INTERACT is the International Network in Crisis Translation, a project funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 734211. Further information about the project as a whole is available at: https://sites.google.com/view/crisistranslation/hom

    The first Automatic Translation Memory Cleaning Shared Task

    This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017 (https://doi.org/10.1007/s10590-016-9183-x). The accepted version of the publication may differ from the final published version.
    This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. The shared task aims at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and therefore contain incorrect translations. As a follow-up to the shared task, we also conducted two surveys, one targeting the participating teams and the other targeting professional translators. The researcher-oriented survey gathered participants' opinions on the shared task, while the translator-oriented survey aimed to better understand what constitutes a good TM unit and to inform decisions for future editions of the task. In this paper, we report on the data preparation process and the evaluation of the submitted automatic systems, as well as on the results of the surveys.

    1st Shared Task on Automatic Translation Memory Cleaning: Preparation and Lessons Learned

    This paper summarizes the work done to prepare the first shared task on automatic translation memory cleaning. The shared task aims at finding automatic ways of cleaning TMs that, for some reason, have not been properly curated and contain incorrect translations. Participants are required to take pairs of source and target segments from TMs and decide whether they are correct translations. Three language pairs were prepared for this first edition: English/Spanish, English/Italian, and English/German. We report on how the shared task was prepared and explain the process of data selection and annotation, the construction of the training and test sets, and the baselines implemented for comparing automatic classifiers.
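    Framed this way, the task is a binary classification of source/target pairs. As a point of reference only, the sketch below implements a crude rule-based filter. It is not one of the baselines released with the shared task, and `TmUnit`, `is_plausible_translation`, `clean` and the `max_ratio` threshold of 2.0 are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TmUnit:
    """A hypothetical TM unit: one source segment and its stored translation."""
    source: str
    target: str


def is_plausible_translation(unit: TmUnit, max_ratio: float = 2.0) -> bool:
    """Heuristic check: reject empty sides, untranslated copies and large length mismatches."""
    src, tgt = unit.source.strip(), unit.target.strip()
    if not src or not tgt:
        return False
    if src == tgt:  # target is an untranslated copy of the source
        return False
    # Ratio of the longer to the shorter segment, measured in characters.
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    return ratio <= max_ratio


def clean(units: List[TmUnit]) -> List[TmUnit]:
    """Keep only the units that the heuristic accepts."""
    return [u for u in units if is_plausible_translation(u)]


if __name__ == "__main__":
    tm = [
        TmUnit("Press the red button.", "Pulse el botón rojo."),
        TmUnit("Press the red button.", "Press the red button."),
        TmUnit("Close the window before restarting the computer.", "Cerrar."),
    ]
    for unit in clean(tm):
        print(unit.target)  # prints: Pulse el botón rojo.
```

A filter this simple misfires on legitimate copies such as numbers or product names, and it cannot detect fluent but wrong translations. That gap is why the task treats cleaning as a learned classification problem.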

    Multiword expressions: Insights from a multi-lingual perspective

    Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory because they often defy the machinery developed for free combinations, where the default assumption is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but comparative work is scarce. This volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. Its ten contributions examine MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research, such as the classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Lexicon Grammar.